Objective

The objective of this project is to implement data cleaning, visualizaiton and analysis on a real set of data and to familiarize ourselves with the process.

Data Selection

For this project, I use the “County-level Oil and Gas Production in the U.S.” as my data set. A brief preview of the data set is shown as the following:

data1<-read.xls("oilgascounty.xls", sheet=1)
head(data1)
##   FIPS geoid Stabr    County_Name Rural_Urban_Continuum_Code_2013
## 1 1001  1001    AL Autauga County                               2
## 2 1003  1003    AL Baldwin County                               3
## 3 1005  1005    AL Barbour County                               6
## 4 1007  1007    AL    Bibb County                               1
## 5 1009  1009    AL  Blount County                               1
## 6 1011  1011    AL Bullock County                               6
##   Urban_Influence_2013 Metro_Nonmetro_2013 Metro_Micro_Noncore_2013
## 1                    2                   1                        2
## 2                    2                   1                        2
## 3                    6                   0                        0
## 4                    1                   1                        2
## 5                    1                   1                        2
## 6                    6                   0                        0
##   oil2000 oil2001 oil2002 oil2003 oil2004 oil2005 oil2006 oil2007 oil2008
## 1       0       0       0       0       0       0       0       0       0
## 2 138,072 134,666 138,011 127,985 130,763 118,043 103,992 112,303  97,623
## 3       0       0       0       0       0       0       0       0       0
## 4       0       0       0       0       0       0       0       0       0
## 5       0       0       0       0       0       0       0       0       0
## 6       0       0       0       0       0       0       0       0       0
##   oil2009 oil2010 oil2011    gas2000    gas2001     gas2002     gas2003
## 1       0       0       0          0          0           0           0
## 2  84,982 101,955  94,638 72,543,902 98,699,994 107,142,655 101,510,068
## 3       0       0       0          0          0           0           0
## 4       0       0       0          0          0           0           0
## 5       0       0       0          0          0           0           0
## 6       0       0       0          0          0           0           0
##      gas2004    gas2005    gas2006    gas2007    gas2008    gas2009
## 1          0          0          0          0          0          0
## 2 90,146,850 84,536,875 83,951,640 82,876,786 78,547,145 68,525,628
## 3          0          0          0          0          0          0
## 4          0      8,301     98,853    480,015    684,143    551,719
## 5          0          0          0          0     20,516     61,054
## 6          0          0          0          0          0          0
##      gas2010    gas2011 oil_change_group gas_change_group
## 1          0          0       Status Quo       Status Quo
## 2 63,069,025 51,041,072       Status Quo        H_Decline
## 3          0          0       Status Quo       Status Quo
## 4    453,132    400,504       Status Quo       Status Quo
## 5      3,594     21,496       Status Quo       Status Quo
## 6          0          0       Status Quo       Status Quo
##   oil_gas_change_group
## 1           Status Quo
## 2            H_Decline
## 3           Status Quo
## 4           Status Quo
## 5           Status Quo
## 6           Status Quo
dim(data1)
## [1] 3109   35

As shown in the preview, the data set is relatively messy and there are several things that can be tidied up. For example, redundancies exist in the data set like FIPS and geoid—they are essentially representing the same thing; in addition, data are not organized by year, which makes it hard to manipulate the data of a specific year.

Data Cleaning

In order to prepare the data for visualization and analysis, I perform a series of steps to clean the data. First of all, I performed a common routine to tidy up the basics redundancies. The first step will be to getting rid of columns without values:

data2 <- data1[(colSums(is.na(data1)) != nrow(data1))]
head(data2)
##   FIPS geoid Stabr    County_Name Rural_Urban_Continuum_Code_2013
## 1 1001  1001    AL Autauga County                               2
## 2 1003  1003    AL Baldwin County                               3
## 3 1005  1005    AL Barbour County                               6
## 4 1007  1007    AL    Bibb County                               1
## 5 1009  1009    AL  Blount County                               1
## 6 1011  1011    AL Bullock County                               6
##   Urban_Influence_2013 Metro_Nonmetro_2013 Metro_Micro_Noncore_2013
## 1                    2                   1                        2
## 2                    2                   1                        2
## 3                    6                   0                        0
## 4                    1                   1                        2
## 5                    1                   1                        2
## 6                    6                   0                        0
##   oil2000 oil2001 oil2002 oil2003 oil2004 oil2005 oil2006 oil2007 oil2008
## 1       0       0       0       0       0       0       0       0       0
## 2 138,072 134,666 138,011 127,985 130,763 118,043 103,992 112,303  97,623
## 3       0       0       0       0       0       0       0       0       0
## 4       0       0       0       0       0       0       0       0       0
## 5       0       0       0       0       0       0       0       0       0
## 6       0       0       0       0       0       0       0       0       0
##   oil2009 oil2010 oil2011    gas2000    gas2001     gas2002     gas2003
## 1       0       0       0          0          0           0           0
## 2  84,982 101,955  94,638 72,543,902 98,699,994 107,142,655 101,510,068
## 3       0       0       0          0          0           0           0
## 4       0       0       0          0          0           0           0
## 5       0       0       0          0          0           0           0
## 6       0       0       0          0          0           0           0
##      gas2004    gas2005    gas2006    gas2007    gas2008    gas2009
## 1          0          0          0          0          0          0
## 2 90,146,850 84,536,875 83,951,640 82,876,786 78,547,145 68,525,628
## 3          0          0          0          0          0          0
## 4          0      8,301     98,853    480,015    684,143    551,719
## 5          0          0          0          0     20,516     61,054
## 6          0          0          0          0          0          0
##      gas2010    gas2011 oil_change_group gas_change_group
## 1          0          0       Status Quo       Status Quo
## 2 63,069,025 51,041,072       Status Quo        H_Decline
## 3          0          0       Status Quo       Status Quo
## 4    453,132    400,504       Status Quo       Status Quo
## 5      3,594     21,496       Status Quo       Status Quo
## 6          0          0       Status Quo       Status Quo
##   oil_gas_change_group
## 1           Status Quo
## 2            H_Decline
## 3           Status Quo
## 4           Status Quo
## 5           Status Quo
## 6           Status Quo
dim(data2)
## [1] 3109   35

From the preview and the dimension of the data before and after cleaning, one can see that there is no column without values.

Then, I delete the cols that only have one value:

data3<-Filter(function(x)(length(unique(x))>1), data2)
head(data3)
##   FIPS geoid Stabr    County_Name Rural_Urban_Continuum_Code_2013
## 1 1001  1001    AL Autauga County                               2
## 2 1003  1003    AL Baldwin County                               3
## 3 1005  1005    AL Barbour County                               6
## 4 1007  1007    AL    Bibb County                               1
## 5 1009  1009    AL  Blount County                               1
## 6 1011  1011    AL Bullock County                               6
##   Urban_Influence_2013 Metro_Nonmetro_2013 Metro_Micro_Noncore_2013
## 1                    2                   1                        2
## 2                    2                   1                        2
## 3                    6                   0                        0
## 4                    1                   1                        2
## 5                    1                   1                        2
## 6                    6                   0                        0
##   oil2000 oil2001 oil2002 oil2003 oil2004 oil2005 oil2006 oil2007 oil2008
## 1       0       0       0       0       0       0       0       0       0
## 2 138,072 134,666 138,011 127,985 130,763 118,043 103,992 112,303  97,623
## 3       0       0       0       0       0       0       0       0       0
## 4       0       0       0       0       0       0       0       0       0
## 5       0       0       0       0       0       0       0       0       0
## 6       0       0       0       0       0       0       0       0       0
##   oil2009 oil2010 oil2011    gas2000    gas2001     gas2002     gas2003
## 1       0       0       0          0          0           0           0
## 2  84,982 101,955  94,638 72,543,902 98,699,994 107,142,655 101,510,068
## 3       0       0       0          0          0           0           0
## 4       0       0       0          0          0           0           0
## 5       0       0       0          0          0           0           0
## 6       0       0       0          0          0           0           0
##      gas2004    gas2005    gas2006    gas2007    gas2008    gas2009
## 1          0          0          0          0          0          0
## 2 90,146,850 84,536,875 83,951,640 82,876,786 78,547,145 68,525,628
## 3          0          0          0          0          0          0
## 4          0      8,301     98,853    480,015    684,143    551,719
## 5          0          0          0          0     20,516     61,054
## 6          0          0          0          0          0          0
##      gas2010    gas2011 oil_change_group gas_change_group
## 1          0          0       Status Quo       Status Quo
## 2 63,069,025 51,041,072       Status Quo        H_Decline
## 3          0          0       Status Quo       Status Quo
## 4    453,132    400,504       Status Quo       Status Quo
## 5      3,594     21,496       Status Quo       Status Quo
## 6          0          0       Status Quo       Status Quo
##   oil_gas_change_group
## 1           Status Quo
## 2            H_Decline
## 3           Status Quo
## 4           Status Quo
## 5           Status Quo
## 6           Status Quo
dim(data3)
## [1] 3109   35

I also delete the cols that have the same value:

data4<-t(unique(t(data3)))
data4<-data.frame(data4)
dim(data4)
## [1] 3109   34

To better process the data, I get rid of comma style format(thousands seperator)

data5<-{
  data4 %>%
    mutate_each(funs(as.character(.)), oil2000:gas2011) %>%
    mutate_each(funs(gsub(",", "", .)), oil2000:gas2011) %>%
    mutate_each(funs(as.numeric(.)), oil2000:gas2011)}
head(data5)
##    FIPS Stabr    County_Name Rural_Urban_Continuum_Code_2013
## 1  1001    AL Autauga County                               2
## 2  1003    AL Baldwin County                               3
## 3  1005    AL Barbour County                               6
## 4  1007    AL    Bibb County                               1
## 5  1009    AL  Blount County                               1
## 6  1011    AL Bullock County                               6
##   Urban_Influence_2013 Metro_Nonmetro_2013 Metro_Micro_Noncore_2013
## 1                    2                   1                        2
## 2                    2                   1                        2
## 3                    6                   0                        0
## 4                    1                   1                        2
## 5                    1                   1                        2
## 6                    6                   0                        0
##   oil2000 oil2001 oil2002 oil2003 oil2004 oil2005 oil2006 oil2007 oil2008
## 1       0       0       0       0       0       0       0       0       0
## 2  138072  134666  138011  127985  130763  118043  103992  112303   97623
## 3       0       0       0       0       0       0       0       0       0
## 4       0       0       0       0       0       0       0       0       0
## 5       0       0       0       0       0       0       0       0       0
## 6       0       0       0       0       0       0       0       0       0
##   oil2009 oil2010 oil2011  gas2000  gas2001   gas2002   gas2003  gas2004
## 1       0       0       0        0        0         0         0        0
## 2   84982  101955   94638 72543902 98699994 107142655 101510068 90146850
## 3       0       0       0        0        0         0         0        0
## 4       0       0       0        0        0         0         0        0
## 5       0       0       0        0        0         0         0        0
## 6       0       0       0        0        0         0         0        0
##    gas2005  gas2006  gas2007  gas2008  gas2009  gas2010  gas2011
## 1        0        0        0        0        0        0        0
## 2 84536875 83951640 82876786 78547145 68525628 63069025 51041072
## 3        0        0        0        0        0        0        0
## 4     8301    98853   480015   684143   551719   453132   400504
## 5        0        0        0    20516    61054     3594    21496
## 6        0        0        0        0        0        0        0
##   oil_change_group gas_change_group oil_gas_change_group
## 1       Status Quo       Status Quo           Status Quo
## 2       Status Quo        H_Decline            H_Decline
## 3       Status Quo       Status Quo           Status Quo
## 4       Status Quo       Status Quo           Status Quo
## 5       Status Quo       Status Quo           Status Quo
## 6       Status Quo       Status Quo           Status Quo
dim(data5)
## [1] 3109   34

After the basic cleanings, I continue to tidy up the data by assigning the data types and changing columns to factor as needed.

#factor
data5$Urban_Influence_2013<-as.factor(data5$Urban_Influence_2013)
data5$Rural_Urban_Continuum_Code_2013<-as.factor(data5$Rural_Urban_Continuum_Code_2013)
data5$oil_change_group <- as.factor(data5$oil_change_group) 
data5$gas_change_group <- as.factor(data5$gas_change_group)
data5$oil_gas_change_group <- as.factor(data5$oil_gas_change_group)


data5$County_Name <- gsub("County", "", data5$County_Name)

To further clean the data, I clean the rows whose annual gross withdrawals of curde oil and natural gas equals to zero since I will be analyzing the regions with oil and gas outputs. I then also create a subset “value”, which only contains data of oil and gas withdrawals.

nums <- sapply(data5, is.numeric)
value <- data5[,nums]

x=NULL
for(i in 1:nrow(value)){
if(sum(value[i,])==0){x=c(x,i)}}

data6<-data5[-x,]
head(data6)
##     FIPS Stabr County_Name Rural_Urban_Continuum_Code_2013
## 2   1003    AL    Baldwin                                3
## 4   1007    AL       Bibb                                1
## 5   1009    AL     Blount                                1
## 12  1023    AL    Choctaw                                9
## 13  1025    AL     Clarke                                7
## 18  1035    AL    Conecuh                                7
##    Urban_Influence_2013 Metro_Nonmetro_2013 Metro_Micro_Noncore_2013
## 2                     2                   1                        2
## 4                     1                   1                        2
## 5                     1                   1                        2
## 12                   10                   0                        0
## 13                   11                   0                        0
## 18                   11                   0                        0
##    oil2000 oil2001 oil2002 oil2003 oil2004 oil2005 oil2006 oil2007 oil2008
## 2   138072  134666  138011  127985  130763  118043  103992  112303   97623
## 4        0       0       0       0       0       0       0       0       0
## 5        0       0       0       0       0       0       0       0       0
## 12  652067  603845  564213  527582  463107  424154  363061  429449  442936
## 13  330933  319891  285497  301260  252415  235616  214651  201348  191449
## 18  110132  142044   96223  116122  423467 1207237 1687577 1821208 2531941
##    oil2009 oil2010 oil2011  gas2000  gas2001   gas2002   gas2003  gas2004
## 2    84982  101955   94638 72543902 98699994 107142655 101510068 90146850
## 4        0       0       0        0        0         0         0        0
## 5        0       0       0        0        0         0         0        0
## 12  350345  375372  372512   283457   308097    351183    333110   330232
## 13  183050  169480  170611   100704   102275     85661     79814    88344
## 18 2485039 2352547 3294092   132402   139643     51886    105935   411087
##     gas2005  gas2006  gas2007  gas2008  gas2009  gas2010  gas2011
## 2  84536875 83951640 82876786 78547145 68525628 63069025 51041072
## 4      8301    98853   480015   684143   551719   453132   400504
## 5         0        0        0    20516    61054     3594    21496
## 12   293620   256483   337575   677198   343412   362477   326952
## 13    85342    66184    61460    55108    56843    41448    43093
## 18  1055787  1488366  1612418  1950537  2597540  3040205  4078749
##    oil_change_group gas_change_group oil_gas_change_group
## 2        Status Quo        H_Decline            H_Decline
## 4        Status Quo       Status Quo           Status Quo
## 5        Status Quo       Status Quo           Status Quo
## 12       Status Quo       Status Quo           Status Quo
## 13       Status Quo       Status Quo           Status Quo
## 18         H_Growth         H_Growth             H_Growth
dim(data6)
## [1] 1166   34

As mentioned before, oil and gas productions by year are recorded as individual columns such as “oil2001” and “gas2001”. I tidy it up by extracting the year and reorganzing the data so that oil and gas only have one column respectively.

#data frame (tidy data)
#oil= subset that does not contain gas data
#gas= subset that does not contain oil data
oil<- data.frame(subset(data6, select = -c(gas2000:gas2011)))
gas<- data.frame(subset(data6, select = -c(oil2000:oil2011)))

#reorganizing oil data by year
oil1 <- oil %>% 
  gather(Year, oil, oil2000:oil2011)%>%
  mutate(Year = gsub("oil", "", Year))%>%
  arrange(County_Name, Year)

#reorganzing gas data by year
gas1<- gas%>%
  gather(Year, gas, gas2000:gas2011)%>%
  mutate(Year = gsub("gas", "", Year))%>%
  arrange(County_Name, Year)

#merge two data frame, get tidy data
data_clean<-merge(gas1,oil1)

The final tidy data looks like the following:

##    FIPS Stabr County_Name Rural_Urban_Continuum_Code_2013
## 1  1003    AL    Baldwin                                3
## 2  1003    AL    Baldwin                                3
## 3  1003    AL    Baldwin                                3
## 4  1003    AL    Baldwin                                3
## 5  1003    AL    Baldwin                                3
## 6  1003    AL    Baldwin                                3
##   Urban_Influence_2013 Metro_Nonmetro_2013 Metro_Micro_Noncore_2013
## 1                    2                   1                        2
## 2                    2                   1                        2
## 3                    2                   1                        2
## 4                    2                   1                        2
## 5                    2                   1                        2
## 6                    2                   1                        2
##   oil_change_group gas_change_group oil_gas_change_group Year       gas
## 1       Status Quo        H_Decline            H_Decline 2000  72543902
## 2       Status Quo        H_Decline            H_Decline 2001  98699994
## 3       Status Quo        H_Decline            H_Decline 2002 107142655
## 4       Status Quo        H_Decline            H_Decline 2003 101510068
## 5       Status Quo        H_Decline            H_Decline 2004  90146850
## 6       Status Quo        H_Decline            H_Decline 2005  84536875
##      oil
## 1 138072
## 2 134666
## 3 138011
## 4 127985
## 5 130763
## 6 118043

Data Visualization

With the tidy data, I could do some visualization to get the general idea of the oil and gas production distribution across the U.S. As a start, I make a Choropleth map of the overall oil and gas productions of the counties from 2001 to 2011.

I start first by combining the oil and gas productions of the 10-year period for each county:

oilvalue<-value[,1:(length(value)/2)]
gasvalue<-value[,(length(value)/2+1):length(value)]
oilsum<-data.frame(rowSums(oilvalue))
gassum<-data.frame(rowSums(gasvalue))
colnames(oilsum)[1]<-"value"
colnames(gassum)[1]<-"value"


df <- data1
aa<-data.frame(subset(df,select = c(1,3,4)))
colnames(aa)[1]<-'region'
oilsum<-cbind(aa,oilsum)
gassum<-cbind(aa,gassum)

I then plot the oil production distribution across the US:

oilcho = CountyChoropleth$new(oilsum)
oilcho$title = "Oil Production across US"
oilcho$ggplot_scale = scale_fill_brewer(name="Barrel Produced", palette=2, drop=FALSE)
oilcho$render()

Also the gas prodcution distribution:

gascho = CountyChoropleth$new(gassum)
gascho$title = "Gas Production across US"
gascho$ggplot_scale = scale_fill_brewer(name="Barrel Produced", palette=3, drop=FALSE)
gascho$render()

From the distribution maps, it can be concluded that majority of the oil and gas production comes from the middle part of the U.S.

I also plot the oil and gas production based on different area codes:

#Plot the oil and gas production based on the Metro, Micro, and Noncore statues
plot_ly(data = data_clean, x = ~Year , y = ~oil, color = ~Metro_Micro_Noncore_2013)
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar
plot_ly(data = data_clean, x = ~Year , y = ~gas, color = ~Metro_Micro_Noncore_2013)
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar
plot_ly(data = data_clean, x = ~Year , y = ~oil, color = ~Rural_Urban_Continuum_Code_2013)
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar
plot_ly(data = data_clean, x = ~Year , y = ~gas, color = ~Rural_Urban_Continuum_Code_2013)
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar

The mean and standard deviation can be better visualized in the box plots:

#Plot the oil and gas production based on Rural Urban Continuum Code 2013
#
plot_ly(data=data_clean, y = ~oil, color = ~Rural_Urban_Continuum_Code_2013, type = "box")
plot_ly(data=data_clean, y = ~gas, color = ~Rural_Urban_Continuum_Code_2013, type = "box")
plot_ly(data=data_clean, y = ~oil, color = ~Metro_Micro_Noncore_2013, type = "box")
plot_ly(data=data_clean, y = ~gas, color = ~Metro_Micro_Noncore_2013, type = "box")

After seeing the productions distribution among areas, I continue to visualize the data with time series scatter plot to see the production trend over the 12-year period.

#scatter plot (x=year, y=sum(value))
#oil hyperbola trend, gas increasing trend
op<-data.frame(colSums(oilvalue))
op$oil<-op$colSums.oilvalue.
op$Year<- c(2000:2011)
og<-data.frame(colSums(gasvalue))
og$gas<-og$colSums.gasvalue.
og$Year<- c(2000:2011)
plot_ly(op, x=~Year, y=~oil, type = "scatter")
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
plot_ly(og, x=~Year, y=~gas, type = "scatter")
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode

Finally we can look at which state produces the most oil and gas

#which state produce the most?
plot_ly(data_clean, x=~ Stabr, y=~ oil)
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar
plot_ly(data_clean, x=~ Stabr, y=~ gas)
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar

Analysis

For analysis, we can start by seeing the summary of tidy data of the oil and gas individually.

#summary 
summary(data_clean$oil)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##         0      1606     72720    982400    507300 208800000
summary(data_clean$gas)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 5.291e+03 6.199e+05 1.549e+07 7.601e+06 1.198e+09

From the plots in the Data Visualization section, it can be seen that there are outliers in the data, making it hard for us to see the distribution among areas. Therefore, I use the following code to remove the outliers in the data set.

#remove outlier
#reference: https://www.r-bloggers.com/identify-describe-plot-and-remove-the-outliers-from-the-dataset/
outlierKD <- function(dt, var) {
  var_name <- eval(substitute(var),eval(dt))
  na1 <- sum(is.na(var_name))
  m1 <- mean(var_name, na.rm = T)
  par(mfrow=c(2, 2), oma=c(0,0,3,0))
  boxplot(var_name, main="With outliers")
  hist(var_name, main="With outliers", xlab=NA, ylab=NA)
  outlier <- boxplot.stats(var_name)$out
  mo <- mean(outlier)
  var_name <- ifelse(var_name %in% outlier, NA, var_name)
  boxplot(var_name, main="Without outliers")
  hist(var_name, main="Without outliers", xlab=NA, ylab=NA)
  title("Outlier Check", outer=TRUE)
  na2 <- sum(is.na(var_name))
  cat("Outliers identified:", na2 - na1)
  m2 <- mean(var_name, na.rm = T)
  dt[as.character(substitute(var))] <- invisible(var_name)
  assign(as.character(as.list(match.call())$dt), dt, envir = .GlobalEnv)
  return(invisible(dt))
}

no_outlier<-data_clean
no_outlier<-outlierKD(no_outlier, oil)

## Outliers identified: 1929
no_outlier<-outlierKD(no_outlier, gas)

## Outliers identified: 2203
no_outlier[no_outlier == 0 & is.numeric(no_outlier)] <- NA

I then use the data set without outlier to replot the plots in the previous section:

#Plot the oil production based on the Metro, Micro, and Noncore statues
plot_ly(data = data_clean, x = ~Year , y = ~oil, color = ~Metro_Micro_Noncore_2013)
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar
plot_ly(data = no_outlier, x = ~Year , y = ~oil, color = ~Metro_Micro_Noncore_2013)
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar
plot_ly(data = data_clean, x = ~Year , y = ~gas, color = ~Metro_Micro_Noncore_2013)
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar
plot_ly(data = no_outlier, x = ~Year , y = ~gas, color = ~Metro_Micro_Noncore_2013)
## No trace type specified:
##   Based on info supplied, a 'bar' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#bar
#Plot the oil and gas production based on Rural Urban Continuum Code 2013
#
plot_ly(data=no_outlier, y = ~oil, color = ~Rural_Urban_Continuum_Code_2013, type = "box")
plot_ly(data=no_outlier, y = ~gas, color = ~Rural_Urban_Continuum_Code_2013, type = "box")
plot_ly(data=no_outlier, y = ~oil, color = ~Metro_Micro_Noncore_2013, type = "box")
plot_ly(data=no_outlier, y = ~gas, color = ~Metro_Micro_Noncore_2013, type = "box")